PharmSci 175/275 (UCI)

What is this??

The material below is Lecture 2 (on Energy Minimization) from Drug Discovery Computing Techniques, PharmSci 175/275 at UC Irvine. Extensive materials for this course, as well as extensive background and related materials, are available on the course GitHub repository: github.com/mobleylab/drug-computing

This material is a set of slides intended for presentation with RISE as detailed in the course materials on GitHub. While it may be useful without RISE, it will also likely appear somewhat less verbose than it would if it were intended for use in written form.

Energy landscapes and energy minimization

Today: Energy landscapes, energy functions, and energy minimization

Instructor: David L. Mobley

Contributors to today's materials:

Energy landscapes provide a useful conceptual device

Energy landscapes govern conformations and flexibility

Chemistry takes place on energy landscapes

Energy landscapes govern dynamics and thermodynamics

Often, we are interested in exploring the energy landscape

A potential, $U({\bf r}^N)$, describes the energy as a function of the coordinates of the particles; here ${\bf r}$ denotes the coordinates, boldface indicates a vector, and the superscript $N$ indicates that these are the coordinates of all $N$ particles in the system.

Landscapes have many features, including

We can't visualize $3N$ dimensions, so we often project onto fewer dimensions

Background: Vector notation

A concrete example

Imagine we have two particles with particle 1 having coordinates $x_1 = 1$, $y_1 = 3$, and $z_1 = 2$, and particle 2 having coordinates $x_2 = -1$, $y_2 = 3$, $z_2 = -1$. That would give us an array like this: \begin{equation} {\bf r}^N= \begin{bmatrix} 1 & 3 & 2 \\ -1 & 3 & -1 \\ \end{bmatrix} \end{equation}

In Python, we'd store that as a numpy array
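For instance, a minimal sketch using the coordinates above:

```python
import numpy as np

# Coordinates of all N = 2 particles; each row is one particle's (x, y, z)
rN = np.array([[ 1.0, 3.0,  2.0],
               [-1.0, 3.0, -1.0]])

print(rN.shape)  # (2, 3): N rows, 3 columns
```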

We could compute the distance between particles as $d = \sqrt{(x_1-x_2)^2 + (y_1-y_2)^2 + (z_1-z_2)^2}$

You could code that up directly in Python, or you can use array operations:
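Here's a sketch of both approaches (the variable names are just illustrative):

```python
import numpy as np

r1 = np.array([1.0, 3.0, 2.0])
r2 = np.array([-1.0, 3.0, -1.0])

# Directly, term by term
d_direct = ((r1[0]-r2[0])**2 + (r1[1]-r2[1])**2 + (r1[2]-r2[2])**2)**0.5

# Using array operations: subtract, square, sum, square root
d_array = np.sqrt(np.sum((r1 - r2)**2))

print(d_direct, d_array)  # both sqrt(13), about 3.6056
```

The array version has the advantage that it works unchanged for any number of dimensions, and generalizes naturally to computing many distances at once.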

Let's let this sink in for a minute

We're using the notation $ {\bf r}^N$ as shorthand to refer to the x, y, and z positions of all of the particles in a system. This is actually an array, or a matrix, of particle coordinates:

\begin{equation} {\bf r}^N= \begin{bmatrix} 1 & 3 & 2 \\ -1 & 3 & -1 \\ \end{bmatrix} \end{equation}

Each row of that matrix has the x, y, and z coordinates of an atom in the system. We will use this concept heavily in the Energy Minimization assignment.

Forces are properties of energy landscapes, too

The force is the negative slope (technically, the negative gradient):

$f_{x,i} = -\frac{\partial U({\bf r^N})}{\partial x_i}$, $f_{y,i} = -\frac{\partial U({\bf r^N})}{\partial y_i}$, $f_{z,i} = -\frac{\partial U({\bf r^N})}{\partial z_i}$

As shorthand, this may be written ${\bf f}^N = -\frac{\partial U({\bf r}^N)}{\partial {\bf r}^N}$ or ${\bf f}^N = -\nabla U({\bf r}^N)$, where the result, ${\bf f}^N$, is an $N \times 3$ array (matrix, if you prefer)

If the energy function is pairwise additive, the force can be evaluated by summing individual interactions -- the force on atom $k$ is \begin{equation} {\bf f}_k = \sum_{j\neq k} \frac{ {\bf r}_{kj}}{r_{kj}} \frac{\partial}{\partial r_{kj}} U(r_{kj}) \end{equation} where ${\bf r}_{kj} = {\bf r}_j - {\bf r}_k$. Note that not all force calculations are necessary, since ${\bf f}_{kj} = -{\bf f}_{jk}$ (Newton's third law).
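As a sketch of how this plays out in code, consider a hypothetical pairwise harmonic potential $U(r) = k(r - r_0)^2$ (the potential and the parameter values are just illustrative choices):

```python
import numpy as np

def pair_forces(rN, k=1.0, r0=1.0):
    """Forces for the pairwise harmonic potential U(r) = k (r - r0)^2,
    using f_kj = -f_jk so each pair is computed only once."""
    N = len(rN)
    f = np.zeros_like(rN)
    for i in range(N):
        for j in range(i + 1, N):
            rij = rN[j] - rN[i]            # vector from atom i to atom j
            r = np.linalg.norm(rij)
            dUdr = 2.0 * k * (r - r0)      # dU/dr for the harmonic pair
            fi = (rij / r) * dUdr          # force on atom i from atom j
            f[i] += fi
            f[j] -= fi                     # Newton's third law
    return f

# Two particles stretched to r = 2 with r0 = 1: each is pulled toward the other
forces = pair_forces(np.array([[0.0, 0.0, 0.0], [2.0, 0.0, 0.0]]))
print(forces)
```

Note that the net force on the system sums to zero, as it must for a pairwise potential.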

The matrix of second derivatives is called the Hessian and distinguishes minima from saddle points

\begin{equation} {\bf H}({\bf r}^N)= \begin{bmatrix} \frac{\partial^2 U({\bf r}^N) }{ \partial x_1^2} & \frac{\partial^2 U({\bf r}^N) }{ \partial x_1 \partial y_1} & \cdots & \frac{\partial^2 U({\bf r}^N) }{ \partial x_1 \partial z_N} \\ \frac{\partial^2 U({\bf r}^N) }{ \partial y_1 \partial x_1} & \frac{\partial^2 U({\bf r}^N) }{ \partial y_1^2} & \cdots & \frac{\partial^2 U({\bf r}^N) }{ \partial y_1 \partial z_N}\\ \vdots & \vdots & \ddots & \vdots \\ \frac{\partial^2 U({\bf r}^N) }{ \partial z_N \partial x_1} & \frac{\partial^2 U({\bf r}^N) }{ \partial z_N \partial y_1} & \cdots & \frac{\partial^2 U({\bf r}^N) }{ \partial z_N^2}\\ \end{bmatrix} \end{equation}

Types of stationary points can be distinguished from derivatives
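A minimal sketch of this idea: build the Hessian by finite differences and look at its eigenvalues, which are all positive at a minimum but have mixed signs at a saddle point (the two test potentials here are just illustrative):

```python
import numpy as np

def hessian(U, r, h=1e-5):
    """Finite-difference Hessian of a potential U at position r."""
    n = len(r)
    H = np.zeros((n, n))
    for i in range(n):
        for j in range(n):
            rpp = r.copy(); rpp[i] += h; rpp[j] += h
            rpm = r.copy(); rpm[i] += h; rpm[j] -= h
            rmp = r.copy(); rmp[i] -= h; rmp[j] += h
            rmm = r.copy(); rmm[i] -= h; rmm[j] -= h
            H[i, j] = (U(rpp) - U(rpm) - U(rmp) + U(rmm)) / (4 * h * h)
    return H

bowl   = lambda r: r[0]**2 + (r[1] - 1)**2   # minimum at (0, 1)
saddle = lambda r: r[0]**2 - r[1]**2         # saddle point at (0, 0)

ev_bowl   = np.linalg.eigvalsh(hessian(bowl,   np.array([0.0, 1.0])))
ev_saddle = np.linalg.eigvalsh(hessian(saddle, np.array([0.0, 0.0])))
print(ev_bowl)    # all positive: a minimum
print(ev_saddle)  # one negative, one positive: a saddle
```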

Energy landscapes have lots of minima

For a 38-atom Lennard-Jones cluster:

See also Doye, Miller, and Wales, J. Chem. Phys. 111: 8417 (1999)

Note related "Python energy landscape explorer" (PELE)

(Image source https://pele-python.github.io/pele/disconnectivity_graph.html, CC-BY license, by the Pele authors: https://github.com/pele-python/pele/graphs/contributors)

Here's a bonus disconnectivity graph:

(Image source https://commons.wikimedia.org/wiki/File:Fedg.png#filelinks, by Snd0, CC-BY-SA 4.0)

We care a lot about finding minima

Finding minima often becomes a numerical task, because analytical solutions quickly become impractical

Consider $U(x,y) = x^2 + (y-1)^2$ for a single particle in a two-dimensional potential. Setting $\nabla U = 0$ yields:

$2x = 0$ and $2(y-1)=0$ or $x=0$, $y=1$

simple enough

But in general, N dimensions means N coupled, potentially nonlinear equations. Consider \begin{equation} U = x^2 z^2 +x (y-1)^2 + xyz + 14y + z^3 \end{equation} Setting the derivatives to zero yields:

$0 = 2xz^2 + (y-1)^2 +yz$

$0 = 2x(y-1) + xz + 14$

$0 = 2x^2z + xy + 3z^2$

Volunteers? This can be solved, but it's not fun.
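To see what the numerical route looks like, here's a sketch using `scipy.optimize.root` on that system (the starting guess is arbitrary, and there's no guarantee any particular guess converges to any particular stationary point):

```python
import numpy as np
from scipy.optimize import root

# Gradient of U = x^2 z^2 + x (y-1)^2 + x y z + 14 y + z^3
def grad_U(v):
    x, y, z = v
    return [2*x*z**2 + (y - 1)**2 + y*z,
            2*x*(y - 1) + x*z + 14,
            2*x**2*z + x*y + 3*z**2]

sol = root(grad_U, [1.0, 1.0, 1.0])   # arbitrary starting guess
if sol.success:
    print(sol.x)   # one stationary point; other guesses may find others
```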

And this is just for a single particle in a 3D potential, so we are typically forced to use numerical minimization, even when the potential is analytic

Energy minimization is a sub-class of the more general problem of finding roots

Common problem: For some $f(x)$, find values of $x$ for which $f(x)=0$

Many equations can be re-cast this way. For example, if you need to solve $g(x) = 3$, define $f(x) = g(x)-3$ and find $x$ such that $f(x)=0$

If $f(x)$ happens to be the force, this maps to energy minimization

As a consequence: Algorithms used for energy minimization typically have broader application to finding roots
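As a sketch of this mapping: if $U(x) = (x-2)^2$, the force is $f(x) = -U'(x) = -2(x-2)$, and a standard root finder locates the zero of the force, which is the minimum of the energy (the potential here is just an illustrative choice):

```python
from scipy.optimize import brentq

# Force = -dU/dx for U(x) = (x - 2)^2
f = lambda x: -2.0 * (x - 2.0)

# brentq needs a bracket where f changes sign: f(0) = 4 > 0, f(5) = -6 < 0
x_min = brentq(f, 0.0, 5.0)
print(x_min)  # 2.0, the minimum of U
```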

Let's check out a toy minimization problem to see how this would work

Here we'll set up a simple function to represent an energy landscape in 1D, and play with energy minimizing on that landscape.
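The original notebook's code cell isn't reproduced here, but a minimal sketch might look like this (the double-well landscape, starting point, and step size are all illustrative choices):

```python
import numpy as np

U    = lambda x: x**4 - 2*x**2 + 0.2*x   # a toy 1D double-well landscape
dUdx = lambda x: 4*x**3 - 4*x + 0.2

x = 1.5                      # starting position
for step in range(1000):
    x = x - 0.01 * dUdx(x)   # small step downhill along the force

print(x, U(x))   # lands in the nearby (right-hand) well
```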

Sandbox section

Try adjusting the above to explore what happens if you alter the starting conditions or the energy landscape or both. You might try:

Steepest descents is a simple minimization algorithm that always steps as far as possible along the direction of the force

Take ${\bf f}^N = -\frac{\partial U({\bf r}^N)}{\partial {\bf r}^N}$, then:

  1. Move in the direction of the force until the minimum along that direction is found
  2. Compute the new force ${\bf f}_i^N$ for iteration $i$; after an exact line search it is perpendicular to the previous direction

Repeat until minimum is found

Limitations:

Oscillates in narrow valleys; slow near minimum.
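A sketch of steepest descents with a numerical line search on a deliberately narrow valley, $U(x,y) = x^2 + 10y^2$ (an illustrative choice), shows this behavior:

```python
import numpy as np
from scipy.optimize import minimize_scalar

U    = lambda r: r[0]**2 + 10*r[1]**2          # narrow elliptical valley
grad = lambda r: np.array([2*r[0], 20*r[1]])

r, n_steps = np.array([10.0, 1.0]), 0
while np.linalg.norm(grad(r)) > 1e-6 and n_steps < 1000:
    d = -grad(r)                                      # follow the force
    alpha = minimize_scalar(lambda a: U(r + a*d)).x   # line search along d
    r = r + alpha * d
    n_steps += 1

print(n_steps, r)   # dozens of zig-zag steps even for a simple quadratic
```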

Reminder: Here we're using vector notation for forces and positions

${\bf f}^N$ is the force on all of the atoms, as an array, where each row of the array is the force vector on that atom.

$U({\bf r}^N)$ is the potential energy, as a function of the positions of all of the atoms.

These use the same vector and array notation we introduced above for ${\bf r}^N$.

Steepest descents oscillates in narrow valleys and is slow near the minimum

(Illustration, P.A. Simonescu, Wikipedia, CC-BY-SA)

Another illustration further highlights this

(In this case, steepest ascents, but it's just the negative...)

(Images public domain: source, source)

A line search can make many minimization methods more efficient

A line search is an efficient way to find a minimum along a particular direction

To finish a line search, identify the minimum precisely
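For instance, SciPy's scalar minimizer can serve as the line search: given a position and a direction, minimize the energy as a function of the step length (a sketch; the quadratic landscape is just for illustration):

```python
import numpy as np
from scipy.optimize import minimize_scalar

U = lambda r: (r[0] - 3.0)**2 + (r[1] + 1.0)**2   # minimum at (3, -1)

r0 = np.array([0.0, 0.0])   # current position
d  = np.array([1.0, 0.0])   # search direction (unit x)

# Minimize U along the line r0 + alpha * d
res = minimize_scalar(lambda alpha: U(r0 + alpha * d))
print(res.x)  # alpha = 3.0: the minimum along this direction
```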

To do better than steepest descents, let's consider its pros and cons

SciPy doesn't implement steepest descents because of these issues

SciPy has lots of functions and tools for optimization, but it doesn't even implement steepest descents because of its poor reliability. However, the Nelder-Mead minimization method applies a downhill simplex approach that is also less than ideal, so let's play around with that a bit. First, let's make a suitable landscape:
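The course's landscape isn't reproduced here; as a stand-in sketch, here's Nelder-Mead on SciPy's built-in Rosenbrock function, whose narrow curved valley makes it a classic test case:

```python
import numpy as np
from scipy.optimize import minimize, rosen

# Rosenbrock function: global minimum at (1, 1), inside a narrow curved valley
result = minimize(rosen, x0=[0.0, 0.0], method='Nelder-Mead',
                  options={'xatol': 1e-8, 'fatol': 1e-8, 'maxiter': 2000})
print(result.x, result.nfev)   # near (1, 1), after many function evaluations
```

Note that Nelder-Mead never uses gradients, which is part of why it needs so many function evaluations.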

Conjugate gradient (CG) works like steepest descent but chooses a different direction

Note that by ${\bf f}^N {\bf f}^N$ we mean vector multiplication, not a matrix multiplication; that is: \begin{equation} {\bf f}^N {\bf f}^N = f^2_{x,1} + f^2_{y,1} + f^2_{z,1} + f^2_{x,2} + ... + f^2_{z, N} \end{equation}

(In Python, this can be computed by multiplying the arrays elementwise, ${\bf f} * {\bf f}$, and summing the result; normal matrix multiplication of arrays would instead use a dot product)

$\gamma_i$ is designed so that the new direction is conjugate to the old direction, meaning the new step does not undo any of the work done by the old step (which would cause oscillation)
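A sketch of this update, using the Fletcher-Reeves form of $\gamma_i$, on a simple quadratic landscape (the potential is just illustrative):

```python
import numpy as np
from scipy.optimize import minimize_scalar

U    = lambda r: r[0]**2 + 10*r[1]**2          # same narrow valley as before
grad = lambda r: np.array([2*r[0], 20*r[1]])

r = np.array([10.0, 1.0])
f = -grad(r)          # initial force
d = f.copy()          # first direction: straight downhill
for i in range(10):
    alpha = minimize_scalar(lambda a: U(r + a*d)).x   # line search along d
    r = r + alpha * d
    f_new = -grad(r)
    if np.linalg.norm(f_new) < 1e-10:
        break
    gamma = (f_new @ f_new) / (f @ f)   # Fletcher-Reeves gamma_i
    d = f_new + gamma * d               # conjugate, not just downhill, direction
    f = f_new

print(r)
```

Compare with steepest descents on the same landscape: CG effectively finishes in about two line searches for a 2D quadratic, instead of zig-zagging for dozens of steps.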

Conjugate gradient (CG) is more efficient

Ideally it takes at most $M$ steps, where $M$ is the number of degrees of freedom (often $3N$), though in practice even small precision errors make it take longer than this.

(Image: Wikipedia, public domain. Oleg Alexandrov.)

Let's look at a CG example
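The notebook's example isn't reproduced here; as a stand-in sketch, here's SciPy's CG implementation on the Rosenbrock function, with an analytic gradient supplied via `jac`:

```python
import numpy as np
from scipy.optimize import minimize, rosen, rosen_der

# CG with an analytic gradient (rosen_der); compare its iteration count
# with the gradient-free Nelder-Mead run on the same landscape
result = minimize(rosen, x0=[0.0, 0.0], jac=rosen_der, method='CG')
print(result.x, result.nit)
```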

More advanced minimization methods can have considerable advantages

Locating global minima is challenging and often involves combination of techniques

When the number of dimensions is small, global minima can be found by repeated trials

One can simply run many minimizations from random starting points and, if there are not too many minima in total, sometimes find the global minimum. See e.g. this local and global minima discussion.

As an exercise, you might try to find the global minimum of this function by picking random starting points and minimizing many times (for solutions, see that link):
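The course's function isn't reproduced here, but the idea can be sketched on a hypothetical 1D landscape with two minima (the potential, number of trials, and sampling range are all arbitrary choices):

```python
import numpy as np
from scipy.optimize import minimize

def U(x):
    x = float(np.atleast_1d(x)[0])    # accept scalars or length-1 arrays
    return x**4 - 3*x**2 + x          # two minima; the global one is near x = -1.30

rng = np.random.default_rng(0)
best = None
for _ in range(20):
    x0 = rng.uniform(-3, 3)           # random starting point
    res = minimize(U, [x0])
    if best is None or res.fun < best.fun:
        best = res                    # keep the lowest minimum found so far

print(best.x[0], best.fun)   # the global minimum, near x = -1.30
```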